TELLTALE: Experiments in a Dynamic Hypertext Environment for Degraded and Multilingual Data

Authors

  • Claudia Pearce
  • Charles K. Nicholas

Abstract

Methods and tools for finding documents relevant to a user's needs in document corpora can be found in the information retrieval, library science, and hypertext communities. Typically, these systems provide retrieval capabilities for fairly static corpora, their algorithms depend on the language for which they were written (e.g., English), and they perform poorly on misspelled words or on text degraded by OCR (optical character recognition). In this article, we present experimental results for the TELLTALE system. TELLTALE is a dynamic hypertext environment that provides full-text search, through a hypertext-style user interface, over text corpora that may be garbled by OCR or transmission errors and that may contain languages other than English. TELLTALE uses several techniques based on n-grams (sequences of n consecutive characters of text). These results show that TELLTALE's dynamic linkage mechanisms tolerate garbling of up to 30% of the characters in the body of the text.
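To illustrate why character n-grams tolerate garbled text, here is a minimal Python sketch. It assumes a plain bag-of-n-grams representation compared by cosine similarity and is not TELLTALE's actual algorithm. The function names `ngram_profile` and `cosine`, the choice n = 5, and the lowercase/whitespace normalization are all hypothetical. A single garbled character disturbs only the (at most n) n-grams that overlap it, so most of a document's profile survives OCR or transmission errors.

```python
from collections import Counter
import math


def ngram_profile(text: str, n: int = 5) -> Counter:
    """Count overlapping character n-grams.

    Hypothetical normalization: lowercase and collapse runs of whitespace.
    Texts shorter than n produce an empty profile.
    """
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors; 0.0 if either is empty."""
    # Counter returns 0 for missing keys, so n-grams absent from b add nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


clean = ngram_profile("information retrieval for degraded text")
garbled = ngram_profile("inf0rmation retrievai for degraded text")  # simulated OCR errors
unrelated = ngram_profile("optimal overhaul replacement policy")

print(f"clean vs garbled:   {cosine(clean, garbled):.2f}")
print(f"clean vs unrelated: {cosine(clean, unrelated):.2f}")
```

The two substituted characters break only nine of the clean text's 5-grams, so the garbled copy still scores well above the unrelated text. Nothing in the sketch relies on an English-specific stemmer or stopword list.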

Similar articles

The TELLTALE Dynamic Hypertext Environment: Approaches to Scalability

Methods and tools for finding documents relevant to a user's needs in document corpora can be found in the information retrieval, library science, and hypertext communities. Typically, these systems provide retrieval capabilities for fairly static corpora, their algorithms are dependent on the language for which they are written, e.g. English, and they don't perform well when presented with missp...

Performance and Scalability of a Large-Scale N-gram Based Information Retrieval System

Information retrieval has become more and more important due to the rapid growth of all kinds of information. However, there are few suitable systems available. This paper presents a few approaches that enable large-scale information retrieval for the TELLTALE system. TELLTALE is a dynamic hypertext information retrieval environment. It provides full-text search for text corpora that may be gar...

Optimal overhaul–replacement policy for a multi-degraded repairable system sold with warranty

In this research, we study an optimal overhaul–replacement policy of a multi-degraded repairable system sold with a free replacement warranty. In the proposed replacement policy, a maintenance action and failure are dependent on a system degradation level and the system age, and hence the replacement model will provide more effective maintenance decisions. Failure of the system is modeled using...

Safety Assessment of cryIAc for human, animals and environment

Risk assessment of a transgene is one of the key steps in genetic transformation. Hence, in order to use cryIAc gene for production of transgenic plants, a library and in-silico research was performed to confirm safety of the gene for human consumption, animal feed and environment. In the first step, the molecular mechanism of action of the CryIAc protein and its specific receptors in the midgu...

Journal:
  • JASIS

Volume 47

Publication date: 1996